A cost issue in ORDER BY + LIMIT

  • Jump to comment-1
    paulguo@gmail.com2022-08-06T15:38:17+00:00
    Hello, Postgres seems to always optimize ORDER BY + LIMIT as top-k sort. Recently I happened to notice that in this scenario the output tuple number of the sort node is not the same as the LIMIT tuple number. See below, postgres=# explain analyze verbose select * from t1 order by a limit 10; QUERY PLAN -------------------------------------------------------------------------------------------------- ------------------------------ Limit (cost=69446.17..69446.20 rows=10 width=4) (actual time=282.896..282.923 rows=10 loops=1) Output: a -> Sort (cost=69446.17..71925.83 rows=991862 width=4) (actual time=282.894..282.896 rows=10 l oops=1) Output: a Sort Key: t1.a Sort Method: top-N heapsort Memory: 25kB -> Seq Scan on public.t1 (cost=0.00..44649.62 rows=991862 width=4) (actual time=0.026.. 195.438 rows=1001000 loops=1) Output: a Planning Time: 0.553 ms Execution Time: 282.961 ms (10 rows) We can see from the output that the LIMIT cost is wrong also due to this since it assumes the input_rows from the sort node is 991862 (per gdb debugging). This could be easily fixed by below patch, diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c index fb28e6411a..800cf0b256 100644 --- a/src/backend/optimizer/path/costsize.c +++ b/src/backend/optimizer/path/costsize.c @@ -2429,7 +2429,11 @@ cost_sort(Path *path, PlannerInfo *root, startup_cost += input_cost; - path->rows = tuples; + if (limit_tuples > 0 && limit_tuples < tuples) + path->rows = limit_tuples; + else + path->rows = tuples; + path->startup_cost = startup_cost; path->total_cost = startup_cost + run_cost; } Withe the patch the explain output looks like this. postgres=# explain analyze verbose select * from t1 order by a limit 10; QUERY PLAN -------------------------------------------------------------------------------------------------- ------------------------------ Limit (cost=69446.17..71925.83 rows=10 width=4) (actual time=256.204..256.207 rows=10 loops=1) Output: a -> Sort (cost=69446.17..71925.83 rows=10 width=4) (actual time=256.202..256.203 rows=10 loops =1) Output: a Sort Key: t1.a Sort Method: top-N heapsort Memory: 25kB -> Seq Scan on public.t1 (cost=0.00..44649.62 rows=991862 width=4) (actual time=1.014.. 169.509 rows=1001000 loops=1) Output: a Planning Time: 0.076 ms Execution Time: 256.232 ms (10 rows) Regards, Paul
    • Jump to comment-1
      tgl@sss.pgh.pa.us2022-08-06T16:12:56+00:00
      Paul Guo <paulguo@gmail.com> writes: > Postgres seems to always optimize ORDER BY + LIMIT as top-k sort. > Recently I happened to notice > that in this scenario the output tuple number of the sort node is not > the same as the LIMIT tuple number. No, it isn't, and your proposed patch is completely misguided. The cost and rowcount estimates for a plan node are always written on the assumption that the node is run to completion. regards, tom lane
    • Jump to comment-1
      zmlpostgres@gmail.com2022-08-06T15:48:55+00:00
      HI, What if the the rows of t1 is less than the limit number(ex: t1 has 5 rows, limit 10)? Does it matter? Regards, Zhang Mingli On Aug 6, 2022, 23:38 +0800, Paul Guo <paulguo@gmail.com>, wrote: > > limit_tuples